Please create an R project again and open it
Create an R script for today's workshop
# The data set contains items from eight different scales
# First, create a vector with the item names in the data set for each scale
sc1_items <- c("sc1_1", "sc1_2_rev", "sc1_3", "sc1_4_rev") # self-concept (pre-test)
sc2_items <- c("sc2_1", "sc2_2_rev", "sc2_3", "sc2_4_rev") # self-concept (post-test)
int1_items <- c("int1_1", "int1_2", "int1_3", "int1_4") # interest (pre-test)
int2_items <- c("int2_1", "int2_2", "int2_3", "int2_4") # interest (post-test)
sco_ability_items <- c("SCO1", "SCO2", "SCO3", "SCO4", "SCO5_rev", "SCO6") # social comparison orientation ability
sco_opinion_items <- c("SCO7", "SCO8", "SCO9", "SCO10", "SCO11_rev") # social comparison orientation opinion
identification_items <- c("Ident1", "Ident2", "Ident3", "Ident4") # university identification
enjoyment_items <- c("End1", "End2_rev", "End3") # enjoyment of the task
# We can use these variables later to compute the scale means
# Load the psych library
library(psych)
alpha(mydata[,sc1_items]) # equivalent to: alpha(mydata[,c("sc1_1", "sc1_2_rev", "sc1_3", "sc1_4_rev")])
Reliability analysis
Call: alpha(x = mydata[, sc1_items])
raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
0.91 0.91 0.9 0.73 11 0.012 4.2 1.2 0.71
95% confidence boundaries
lower alpha upper
Feldt 0.88 0.91 0.93
Duhachek 0.88 0.91 0.93
Reliability if an item is dropped:
raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
sc1_1 0.86 0.87 0.82 0.68 6.5 0.018 0.00184 0.70
sc1_2_rev 0.90 0.91 0.88 0.77 9.9 0.014 0.00452 0.74
sc1_3 0.88 0.89 0.84 0.72 7.7 0.016 0.00043 0.71
sc1_4_rev 0.88 0.89 0.86 0.73 8.1 0.016 0.01097 0.71
Item statistics
n raw.r std.r r.cor r.drop mean sd
sc1_1 166 0.91 0.93 0.91 0.85 4.0 1.2
sc1_2_rev 166 0.86 0.86 0.78 0.75 4.1 1.5
sc1_3 166 0.88 0.90 0.86 0.80 4.1 1.3
sc1_4_rev 166 0.90 0.89 0.83 0.80 4.6 1.6
Non missing response frequency for each item
1 2 3 4 5 6 7 miss
sc1_1 0.01 0.12 0.19 0.27 0.33 0.08 0.01 0
sc1_2_rev 0.02 0.17 0.17 0.19 0.23 0.20 0.01 0
sc1_3 0.01 0.10 0.24 0.23 0.28 0.13 0.01 0
sc1_4_rev 0.01 0.14 0.13 0.17 0.20 0.23 0.11 0
Reliability analysis
Call: alpha(x = mydata[, sco_ability_items])
raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
0.83 0.84 0.83 0.46 5.1 0.02 4.1 1.2 0.45
95% confidence boundaries
lower alpha upper
Feldt 0.79 0.83 0.87
Duhachek 0.80 0.83 0.87
Reliability if an item is dropped:
raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
SCO1 0.85 0.85 0.84 0.54 5.8 0.018 0.012 0.54
SCO2 0.77 0.77 0.75 0.41 3.4 0.028 0.016 0.39
SCO3 0.80 0.80 0.79 0.45 4.1 0.024 0.022 0.44
SCO4 0.79 0.79 0.78 0.44 3.9 0.026 0.026 0.40
SCO5_rev 0.79 0.79 0.78 0.43 3.8 0.026 0.021 0.41
SCO6 0.82 0.82 0.81 0.48 4.7 0.022 0.028 0.54
Item statistics
n raw.r std.r r.cor r.drop mean sd
SCO1 166 0.56 0.56 0.41 0.38 3.8 1.6
SCO2 166 0.85 0.86 0.86 0.77 4.4 1.5
SCO3 166 0.75 0.76 0.70 0.63 4.7 1.5
SCO4 166 0.80 0.79 0.74 0.68 3.7 1.6
SCO5_rev 166 0.80 0.79 0.76 0.68 4.2 1.7
SCO6 166 0.68 0.68 0.58 0.53 3.8 1.6
Non missing response frequency for each item
1 2 3 4 5 6 7 miss
SCO1 0.07 0.21 0.16 0.19 0.20 0.14 0.02 0
SCO2 0.02 0.12 0.16 0.22 0.22 0.18 0.08 0
SCO3 0.02 0.09 0.08 0.19 0.31 0.20 0.11 0
SCO4 0.08 0.20 0.19 0.17 0.19 0.13 0.03 0
SCO5_rev 0.06 0.13 0.18 0.18 0.19 0.20 0.06 0
SCO6 0.06 0.19 0.19 0.20 0.18 0.14 0.03 0
Reliability analysis
Call: alpha(x = mydata[, sco_opinion_items])
raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
0.73 0.74 0.77 0.36 2.8 0.033 5.1 0.94 0.41
95% confidence boundaries
lower alpha upper
Feldt 0.66 0.73 0.79
Duhachek 0.66 0.73 0.79
Reliability if an item is dropped:
raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
SCO7 0.66 0.67 0.70 0.34 2.0 0.043 0.113 0.32
SCO8 0.59 0.60 0.59 0.27 1.5 0.053 0.053 0.27
SCO9 0.61 0.62 0.64 0.29 1.7 0.051 0.075 0.27
SCO10 0.66 0.67 0.71 0.33 2.0 0.045 0.093 0.27
SCO11_rev 0.84 0.83 0.82 0.56 5.1 0.020 0.017 0.59
Item statistics
n raw.r std.r r.cor r.drop mean sd
SCO7 165 0.72 0.74 0.648 0.551 5.8 1.2
SCO8 165 0.84 0.84 0.861 0.713 5.4 1.4
SCO9 165 0.82 0.81 0.799 0.665 5.0 1.5
SCO10 166 0.73 0.74 0.655 0.564 5.0 1.3
SCO11_rev 166 0.38 0.36 0.091 0.071 4.3 1.5
Non missing response frequency for each item
1 2 3 4 5 6 7 miss
SCO7 0.01 0.02 0.02 0.04 0.16 0.43 0.31 0.01
SCO8 0.01 0.05 0.03 0.11 0.22 0.38 0.19 0.01
SCO9 0.02 0.04 0.12 0.12 0.27 0.28 0.15 0.01
SCO10 0.00 0.04 0.08 0.18 0.28 0.31 0.10 0.00
SCO11_rev 0.04 0.08 0.16 0.28 0.25 0.13 0.07 0.00
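The "Reliability if an item is dropped" table above shows that alpha for the opinion scale would rise from .73 to .84 without SCO11_rev, and that item's corrected item-total correlation is very low (r.drop = .07). As a hypothetical follow-up (not part of the original script), one could re-run the analysis without that item:

```r
# Re-run the reliability analysis for the opinion scale without SCO11_rev
# (setdiff() removes that item name from the vector of item names)
alpha(mydata[, setdiff(sco_opinion_items, "SCO11_rev")])
```

Whether the item should actually be dropped is a substantive decision, not just a statistical one.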
# As last time, we now create new columns for the
# scale means using the rowMeans() function
mydata[,"sc1_mean"] <- rowMeans(mydata[,sc1_items], na.rm = TRUE)
mydata[,"sc2_mean"] <- rowMeans(mydata[,sc2_items], na.rm = TRUE)
mydata[,"int1_mean"] <- rowMeans(mydata[,int1_items], na.rm = TRUE)
mydata[,"int2_mean"] <- rowMeans(mydata[,int2_items], na.rm = TRUE)
mydata[,"sco_ability_mean"] <- rowMeans(mydata[,sco_ability_items], na.rm = TRUE)
mydata[,"sco_opinion_mean"] <- rowMeans(mydata[,sco_opinion_items], na.rm = TRUE)
mydata[,"identification_mean"] <- rowMeans(mydata[,identification_items], na.rm = TRUE)
mydata[,"enjoyment_mean"] <- rowMeans(mydata[,enjoyment_items], na.rm = TRUE)
The tidyverse is a collection of libraries for data analysis
It includes various libraries and functions that simplify data preparation, tools for creating graphics (ggplot), and much more
More information at https://www.tidyverse.org/
Cheat sheets are graphical overviews of the functions a library provides
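As a sketch of the tidyverse style just described, a scale mean like the rowMeans() calls above could also be computed with dplyr's mutate(); this assumes a recent dplyr version that provides pick(). A small toy data frame stands in for the workshop data here:

```r
library(dplyr)

# Toy data standing in for the workshop's mydata (hypothetical values)
sc1_items <- c("sc1_1", "sc1_2_rev", "sc1_3", "sc1_4_rev")
toy <- data.frame(sc1_1 = c(4, 5), sc1_2_rev = c(3, NA),
                  sc1_3 = c(5, 4), sc1_4_rev = c(4, 6))

# Tidyverse-style equivalent of a rowMeans() call:
# pick(all_of(sc1_items)) selects the item columns inside mutate()
toy <- toy |>
  mutate(sc1_mean = rowMeans(pick(all_of(sc1_items)), na.rm = TRUE))
toy$sc1_mean   # 4 and 5
```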
# After computing the scale means, it is useful for clarity
# to create a new data set that no longer contains the individual items
mydata_scales <- select(mydata, !c(all_of(c(int1_items, int2_items, sc1_items, sc2_items, sco_ability_items, sco_opinion_items, enjoyment_items, identification_items)), "sc1_2", "sc1_4", "sc2_2", "sc2_4", "SCO5", "SCO11", "End2"))
# This also works with the pipe
mydata_scales <- mydata |>
select(!c(all_of(c(int1_items, int2_items, sc1_items, sc2_items, sco_ability_items, sco_opinion_items, enjoyment_items, identification_items)), "sc1_2", "sc1_4", "sc2_2", "sc2_4", "SCO5", "SCO11", "End2"))
# equivalent to:
# select(mydata, !c(all_of(c(int1_items, int2_items, sc1_items, sc2_items, sco_ability_items, sco_opinion_items, enjoyment_items, identification_items)), "sc1_2", "sc1_4", "sc2_2", "sc2_4", "SCO5", "SCO11", "End2"))
# The pipe passes the object before the pipe |> as the first argument to the function after the pipe |>
# Sometimes %>% is used as the pipe instead
# For our purposes today, both pipes are equivalent
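The pipe mechanics described above can be sketched in isolation; sort() and head() here are just arbitrary example functions:

```r
x <- c(5, 1, 4, 2, 3)

# Without the pipe: the data is the first argument of each function
head(sort(x), 3)          # 1 2 3

# With the base R pipe: the object before |> becomes the first argument
x |> sort() |> head(3)    # 1 2 3
```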
# %>% comes from the tidyverse library (more precisely, from magrittr); |> is part of base R (no library needed)
# Descriptive statistics for the new data set
describe(mydata_scales)
vars n mean sd median trimmed mad min
UserID 1 166 1114.83 54.37 1115.50 1115.03 68.94 1020.00
rating1 2 166 2.51 1.27 2.00 2.40 1.48 1.00
rating2 3 166 2.98 0.86 3.00 2.97 1.48 1.00
run1 4 166 0.52 0.15 0.52 0.52 0.17 0.13
run2 5 166 0.72 0.16 0.73 0.72 0.15 0.30
age 6 166 22.48 5.65 20.00 21.31 1.48 18.00
gender 7 166 1.11 0.32 1.00 1.02 0.00 1.00
sozpos* 8 166 1.51 0.50 2.00 1.51 0.00 1.00
sc1_mean 9 166 4.20 1.25 4.25 4.24 1.48 1.50
sc2_mean 10 166 4.02 1.28 4.00 4.03 1.11 1.00
int1_mean 11 166 4.29 1.46 4.50 4.36 1.48 1.00
int2_mean 12 166 4.27 1.59 4.50 4.36 1.48 1.00
sco_ability_mean 13 166 4.08 1.17 4.17 4.10 1.24 1.50
sco_opinion_mean 14 166 5.12 0.94 5.20 5.21 0.89 1.60
identification_mean 15 166 4.64 1.00 4.75 4.69 1.11 1.75
enjoyment_mean 16 165 5.01 1.08 5.00 5.06 0.99 1.33
max range skew kurtosis se
UserID 1209.00 189.00 -0.02 -1.17 4.22
rating1 5.00 4.00 0.52 -0.88 0.10
rating2 5.00 4.00 0.05 -0.33 0.07
run1 0.90 0.77 0.05 -0.23 0.01
run2 1.00 0.70 -0.50 -0.41 0.01
age 46.00 28.00 1.95 3.52 0.44
gender 2.00 1.00 2.40 3.78 0.02
sozpos* 2.00 1.00 -0.02 -2.01 0.04
sc1_mean 7.00 5.50 -0.16 -0.92 0.10
sc2_mean 7.00 6.00 -0.16 -0.45 0.10
int1_mean 7.00 6.00 -0.41 -0.71 0.11
int2_mean 7.00 6.00 -0.45 -0.73 0.12
sco_ability_mean 6.83 5.33 -0.08 -0.74 0.09
sco_opinion_mean 7.00 5.40 -1.03 1.48 0.07
identification_mean 7.00 5.25 -0.45 0.14 0.08
enjoyment_mean 7.00 5.67 -0.53 0.12 0.08
# This is how we can get the descriptive statistics separately by group
describeBy(mydata_scales ~ sozpos)
Descriptive statistics by group
sozpos: low social position
vars n mean sd median trimmed mad min max
UserID 1 82 1116.43 53.09 1121.00 1117.03 63.01 1021.00 1209.00
rating1 2 82 1.60 0.61 2.00 1.56 0.00 1.00 4.00
rating2 3 82 2.73 0.80 3.00 2.71 1.48 1.00 5.00
run1 4 82 0.52 0.14 0.50 0.52 0.15 0.13 0.80
run2 5 82 0.71 0.17 0.73 0.72 0.15 0.30 1.00
age 6 82 22.60 5.62 20.00 21.50 1.48 18.00 43.00
gender 7 82 1.12 0.33 1.00 1.03 0.00 1.00 2.00
sozpos 8 82 1.00 0.00 1.00 1.00 0.00 1.00 1.00
sc1_mean 9 82 4.12 1.26 4.00 4.15 1.48 1.75 7.00
sc2_mean 10 82 3.72 1.26 3.88 3.70 1.30 1.00 7.00
int1_mean 11 82 4.38 1.51 4.50 4.45 1.48 1.00 7.00
int2_mean 12 82 4.21 1.61 4.25 4.27 1.48 1.00 7.00
sco_ability_mean 13 82 4.02 1.14 4.17 4.05 1.24 1.50 6.17
sco_opinion_mean 14 82 5.10 0.89 5.20 5.19 0.89 2.60 6.80
identification_mean 15 82 4.75 1.03 4.88 4.80 0.93 2.00 7.00
enjoyment_mean 16 82 4.90 1.01 5.00 4.94 0.99 2.00 7.00
range skew kurtosis se
UserID 188.00 -0.12 -1.08 5.86
rating1 3.00 0.78 1.25 0.07
rating2 4.00 0.08 -0.09 0.09
run1 0.67 -0.11 -0.33 0.02
run2 0.70 -0.46 -0.42 0.02
age 25.00 1.73 2.45 0.62
gender 1.00 2.27 3.19 0.04
sozpos 0.00 NaN NaN 0.00
sc1_mean 5.25 -0.10 -1.03 0.14
sc2_mean 6.00 -0.02 -0.66 0.14
int1_mean 6.00 -0.40 -0.61 0.17
int2_mean 6.00 -0.30 -0.71 0.18
sco_ability_mean 4.67 -0.27 -0.86 0.13
sco_opinion_mean 4.20 -0.91 0.66 0.10
identification_mean 5.00 -0.49 -0.17 0.11
enjoyment_mean 5.00 -0.44 -0.23 0.11
------------------------------------------------------------
sozpos: high social position
vars n mean sd median trimmed mad min max
UserID 1 84 1113.27 55.86 1110.00 1113.09 70.42 1020.00 1208.00
rating1 2 84 3.40 1.11 3.50 3.43 0.74 1.00 5.00
rating2 3 84 3.21 0.85 3.00 3.22 1.48 1.00 5.00
run1 4 84 0.52 0.15 0.53 0.52 0.17 0.20 0.90
run2 5 84 0.72 0.15 0.73 0.73 0.15 0.33 0.97
age 6 84 22.37 5.71 20.00 21.13 2.97 18.00 46.00
gender 7 84 1.11 0.31 1.00 1.01 0.00 1.00 2.00
sozpos 8 84 2.00 0.00 2.00 2.00 0.00 2.00 2.00
sc1_mean 9 84 4.28 1.24 4.25 4.32 1.48 1.50 6.50
sc2_mean 10 84 4.31 1.24 4.25 4.35 1.11 1.00 7.00
int1_mean 11 84 4.21 1.41 4.50 4.28 1.48 1.00 6.50
int2_mean 12 84 4.32 1.58 4.75 4.44 1.48 1.00 7.00
sco_ability_mean 13 84 4.14 1.21 4.00 4.14 1.24 1.83 6.83
sco_opinion_mean 14 84 5.14 0.99 5.20 5.24 0.89 1.60 7.00
identification_mean 15 84 4.53 0.96 4.50 4.57 0.74 1.75 6.50
enjoyment_mean 16 83 5.11 1.13 5.33 5.17 0.99 1.33 7.00
range skew kurtosis se
UserID 188.00 0.08 -1.28 6.10
rating1 4.00 -0.21 -0.90 0.12
rating2 4.00 -0.07 -0.50 0.09
run1 0.70 0.16 -0.27 0.02
run2 0.63 -0.52 -0.50 0.02
age 28.00 2.12 4.38 0.62
gender 1.00 2.50 4.28 0.03
sozpos 0.00 NaN NaN 0.00
sc1_mean 5.00 -0.21 -0.85 0.13
sc2_mean 6.00 -0.30 -0.12 0.13
int1_mean 5.50 -0.45 -0.93 0.15
int2_mean 6.00 -0.60 -0.77 0.17
sco_ability_mean 5.00 0.05 -0.77 0.13
sco_opinion_mean 5.40 -1.10 1.86 0.11
identification_mean 4.75 -0.46 0.49 0.10
enjoyment_mean 5.67 -0.66 0.35 0.12
# The tilde tells us that this is a formula
# Formulas are a common way in R to express relationships between variables
# To the left of the tilde (~) are the dependent variable(s) (DVs), to the right the independent variable(s) (IVs)
# In this case all columns are DVs and the column sozpos is the IV
# Alternatively, we could also write it like this:
describeBy(mydata_scales, group = mydata_scales[,"sozpos"])
Descriptive statistics by group
group: low social position
vars n mean sd median trimmed mad min max
UserID 1 82 1116.43 53.09 1121.00 1117.03 63.01 1021.00 1209.00
rating1 2 82 1.60 0.61 2.00 1.56 0.00 1.00 4.00
rating2 3 82 2.73 0.80 3.00 2.71 1.48 1.00 5.00
run1 4 82 0.52 0.14 0.50 0.52 0.15 0.13 0.80
run2 5 82 0.71 0.17 0.73 0.72 0.15 0.30 1.00
age 6 82 22.60 5.62 20.00 21.50 1.48 18.00 43.00
gender 7 82 1.12 0.33 1.00 1.03 0.00 1.00 2.00
sozpos 8 82 1.00 0.00 1.00 1.00 0.00 1.00 1.00
sc1_mean 9 82 4.12 1.26 4.00 4.15 1.48 1.75 7.00
sc2_mean 10 82 3.72 1.26 3.88 3.70 1.30 1.00 7.00
int1_mean 11 82 4.38 1.51 4.50 4.45 1.48 1.00 7.00
int2_mean 12 82 4.21 1.61 4.25 4.27 1.48 1.00 7.00
sco_ability_mean 13 82 4.02 1.14 4.17 4.05 1.24 1.50 6.17
sco_opinion_mean 14 82 5.10 0.89 5.20 5.19 0.89 2.60 6.80
identification_mean 15 82 4.75 1.03 4.88 4.80 0.93 2.00 7.00
enjoyment_mean 16 82 4.90 1.01 5.00 4.94 0.99 2.00 7.00
range skew kurtosis se
UserID 188.00 -0.12 -1.08 5.86
rating1 3.00 0.78 1.25 0.07
rating2 4.00 0.08 -0.09 0.09
run1 0.67 -0.11 -0.33 0.02
run2 0.70 -0.46 -0.42 0.02
age 25.00 1.73 2.45 0.62
gender 1.00 2.27 3.19 0.04
sozpos 0.00 NaN NaN 0.00
sc1_mean 5.25 -0.10 -1.03 0.14
sc2_mean 6.00 -0.02 -0.66 0.14
int1_mean 6.00 -0.40 -0.61 0.17
int2_mean 6.00 -0.30 -0.71 0.18
sco_ability_mean 4.67 -0.27 -0.86 0.13
sco_opinion_mean 4.20 -0.91 0.66 0.10
identification_mean 5.00 -0.49 -0.17 0.11
enjoyment_mean 5.00 -0.44 -0.23 0.11
------------------------------------------------------------
group: high social position
vars n mean sd median trimmed mad min max
UserID 1 84 1113.27 55.86 1110.00 1113.09 70.42 1020.00 1208.00
rating1 2 84 3.40 1.11 3.50 3.43 0.74 1.00 5.00
rating2 3 84 3.21 0.85 3.00 3.22 1.48 1.00 5.00
run1 4 84 0.52 0.15 0.53 0.52 0.17 0.20 0.90
run2 5 84 0.72 0.15 0.73 0.73 0.15 0.33 0.97
age 6 84 22.37 5.71 20.00 21.13 2.97 18.00 46.00
gender 7 84 1.11 0.31 1.00 1.01 0.00 1.00 2.00
sozpos 8 84 2.00 0.00 2.00 2.00 0.00 2.00 2.00
sc1_mean 9 84 4.28 1.24 4.25 4.32 1.48 1.50 6.50
sc2_mean 10 84 4.31 1.24 4.25 4.35 1.11 1.00 7.00
int1_mean 11 84 4.21 1.41 4.50 4.28 1.48 1.00 6.50
int2_mean 12 84 4.32 1.58 4.75 4.44 1.48 1.00 7.00
sco_ability_mean 13 84 4.14 1.21 4.00 4.14 1.24 1.83 6.83
sco_opinion_mean 14 84 5.14 0.99 5.20 5.24 0.89 1.60 7.00
identification_mean 15 84 4.53 0.96 4.50 4.57 0.74 1.75 6.50
enjoyment_mean 16 83 5.11 1.13 5.33 5.17 0.99 1.33 7.00
range skew kurtosis se
UserID 188.00 0.08 -1.28 6.10
rating1 4.00 -0.21 -0.90 0.12
rating2 4.00 -0.07 -0.50 0.09
run1 0.70 0.16 -0.27 0.02
run2 0.63 -0.52 -0.50 0.02
age 28.00 2.12 4.38 0.62
gender 1.00 2.50 4.28 0.03
sozpos 0.00 NaN NaN 0.00
sc1_mean 5.00 -0.21 -0.85 0.13
sc2_mean 6.00 -0.30 -0.12 0.13
int1_mean 5.50 -0.45 -0.93 0.15
int2_mean 6.00 -0.60 -0.77 0.17
sco_ability_mean 5.00 0.05 -0.77 0.13
sco_opinion_mean 5.40 -1.10 1.86 0.11
identification_mean 4.75 -0.46 0.49 0.10
enjoyment_mean 5.67 -0.66 0.35 0.12
# Define the correlation variables
cor_vars <- c("sc1_mean", "sc2_mean", "int1_mean", "int2_mean", "sco_ability_mean", "sco_opinion_mean", "enjoyment_mean", "identification_mean")
# Compute the correlation statistics for these variables
corr.test(mydata_scales[,cor_vars])
Call:corr.test(x = mydata_scales[, cor_vars])
Correlation matrix
sc1_mean sc2_mean int1_mean int2_mean sco_ability_mean
sc1_mean 1.00 0.86 0.42 0.41 0.03
sc2_mean 0.86 1.00 0.46 0.47 0.00
int1_mean 0.42 0.46 1.00 0.90 -0.01
int2_mean 0.41 0.47 0.90 1.00 -0.09
sco_ability_mean 0.03 0.00 -0.01 -0.09 1.00
sco_opinion_mean -0.05 -0.03 0.07 0.07 0.48
enjoyment_mean 0.24 0.25 0.56 0.58 0.12
identification_mean -0.01 0.00 0.15 0.13 0.15
sco_opinion_mean enjoyment_mean identification_mean
sc1_mean -0.05 0.24 -0.01
sc2_mean -0.03 0.25 0.00
int1_mean 0.07 0.56 0.15
int2_mean 0.07 0.58 0.13
sco_ability_mean 0.48 0.12 0.15
sco_opinion_mean 1.00 0.22 0.26
enjoyment_mean 0.22 1.00 0.19
identification_mean 0.26 0.19 1.00
Sample Size
sc1_mean sc2_mean int1_mean int2_mean sco_ability_mean
sc1_mean 166 166 166 166 166
sc2_mean 166 166 166 166 166
int1_mean 166 166 166 166 166
int2_mean 166 166 166 166 166
sco_ability_mean 166 166 166 166 166
sco_opinion_mean 166 166 166 166 166
enjoyment_mean 165 165 165 165 165
identification_mean 166 166 166 166 166
sco_opinion_mean enjoyment_mean identification_mean
sc1_mean 166 165 166
sc2_mean 166 165 166
int1_mean 166 165 166
int2_mean 166 165 166
sco_ability_mean 166 165 166
sco_opinion_mean 166 165 166
enjoyment_mean 165 165 165
identification_mean 166 165 166
Probability values (Entries above the diagonal are adjusted for multiple tests.)
sc1_mean sc2_mean int1_mean int2_mean sco_ability_mean
sc1_mean 0.00 0.00 0.00 0.00 1.00
sc2_mean 0.00 0.00 0.00 0.00 1.00
int1_mean 0.00 0.00 0.00 0.00 1.00
int2_mean 0.00 0.00 0.00 0.00 1.00
sco_ability_mean 0.72 0.99 0.90 0.27 0.00
sco_opinion_mean 0.54 0.66 0.37 0.36 0.00
enjoyment_mean 0.00 0.00 0.00 0.00 0.14
identification_mean 0.90 0.99 0.05 0.09 0.06
sco_opinion_mean enjoyment_mean identification_mean
sc1_mean 1 0.03 1.00
sc2_mean 1 0.02 1.00
int1_mean 1 0.00 0.65
int2_mean 1 0.00 1.00
sco_ability_mean 0 1.00 0.74
sco_opinion_mean 0 0.06 0.01
enjoyment_mean 0 0.00 0.21
identification_mean 0 0.01 0.00
To see confidence intervals of the correlations, print with the short=FALSE option
# The corr.test() function has no built-in group comparison,
# but various other functions can fill this gap
# With the subset() function we can restrict the analysis to a subset of the cases
corr.test(subset(mydata_scales, sozpos == "low social position")[,cor_vars])
Call:corr.test(x = subset(mydata_scales, sozpos == "low social position")[,
cor_vars])
Correlation matrix
sc1_mean sc2_mean int1_mean int2_mean sco_ability_mean
sc1_mean 1.00 0.84 0.39 0.38 0.25
sc2_mean 0.84 1.00 0.48 0.46 0.22
int1_mean 0.39 0.48 1.00 0.90 0.04
int2_mean 0.38 0.46 0.90 1.00 -0.01
sco_ability_mean 0.25 0.22 0.04 -0.01 1.00
sco_opinion_mean -0.04 -0.03 0.04 0.13 0.39
enjoyment_mean 0.27 0.29 0.66 0.66 0.03
identification_mean -0.10 -0.09 0.15 0.10 0.17
sco_opinion_mean enjoyment_mean identification_mean
sc1_mean -0.04 0.27 -0.10
sc2_mean -0.03 0.29 -0.09
int1_mean 0.04 0.66 0.15
int2_mean 0.13 0.66 0.10
sco_ability_mean 0.39 0.03 0.17
sco_opinion_mean 1.00 0.12 0.25
enjoyment_mean 0.12 1.00 0.13
identification_mean 0.25 0.13 1.00
Sample Size
[1] 82
Probability values (Entries above the diagonal are adjusted for multiple tests.)
sc1_mean sc2_mean int1_mean int2_mean sco_ability_mean
sc1_mean 0.00 0.00 0.01 0.01 0.39
sc2_mean 0.00 0.00 0.00 0.00 0.67
int1_mean 0.00 0.00 0.00 0.00 1.00
int2_mean 0.00 0.00 0.00 0.00 1.00
sco_ability_mean 0.02 0.04 0.73 0.90 0.00
sco_opinion_mean 0.69 0.79 0.74 0.26 0.00
enjoyment_mean 0.01 0.01 0.00 0.00 0.81
identification_mean 0.36 0.41 0.19 0.38 0.13
sco_opinion_mean enjoyment_mean identification_mean
sc1_mean 1.00 0.24 1.00
sc2_mean 1.00 0.16 1.00
int1_mean 1.00 0.00 1.00
int2_mean 1.00 0.00 1.00
sco_ability_mean 0.01 1.00 1.00
sco_opinion_mean 0.00 1.00 0.41
enjoyment_mean 0.30 0.00 1.00
identification_mean 0.03 0.23 0.00
To see confidence intervals of the correlations, print with the short=FALSE option
# The filter() function is used somewhat differently,
# but the result is the same as with the subset() function.
# The two functions represent two different programming styles
# that coexist in R and can be chosen as the occasion demands.
mydata_scales |>
filter(sozpos == "high social position") |>
select(cor_vars) |>
corr.test()
Call:corr.test(x = select(filter(mydata_scales, sozpos == "high social position"),
cor_vars))
Correlation matrix
sc1_mean sc2_mean int1_mean int2_mean sco_ability_mean
sc1_mean 1.00 0.90 0.46 0.43 -0.18
sc2_mean 0.90 1.00 0.48 0.49 -0.23
int1_mean 0.46 0.48 1.00 0.90 -0.05
int2_mean 0.43 0.49 0.90 1.00 -0.16
sco_ability_mean -0.18 -0.23 -0.05 -0.16 1.00
sco_opinion_mean -0.05 -0.05 0.11 0.02 0.56
enjoyment_mean 0.20 0.19 0.49 0.50 0.18
identification_mean 0.10 0.15 0.15 0.17 0.14
sco_opinion_mean enjoyment_mean identification_mean
sc1_mean -0.05 0.20 0.10
sc2_mean -0.05 0.19 0.15
int1_mean 0.11 0.49 0.15
int2_mean 0.02 0.50 0.17
sco_ability_mean 0.56 0.18 0.14
sco_opinion_mean 1.00 0.31 0.29
enjoyment_mean 0.31 1.00 0.27
identification_mean 0.29 0.27 1.00
Sample Size
sc1_mean sc2_mean int1_mean int2_mean sco_ability_mean
sc1_mean 84 84 84 84 84
sc2_mean 84 84 84 84 84
int1_mean 84 84 84 84 84
int2_mean 84 84 84 84 84
sco_ability_mean 84 84 84 84 84
sco_opinion_mean 84 84 84 84 84
enjoyment_mean 83 83 83 83 83
identification_mean 84 84 84 84 84
sco_opinion_mean enjoyment_mean identification_mean
sc1_mean 84 83 84
sc2_mean 84 83 84
int1_mean 84 83 84
int2_mean 84 83 84
sco_ability_mean 84 83 84
sco_opinion_mean 84 83 84
enjoyment_mean 83 83 83
identification_mean 84 83 84
Probability values (Entries above the diagonal are adjusted for multiple tests.)
sc1_mean sc2_mean int1_mean int2_mean sco_ability_mean
sc1_mean 0.00 0.00 0.00 0.00 1.00
sc2_mean 0.00 0.00 0.00 0.00 0.56
int1_mean 0.00 0.00 0.00 0.00 1.00
int2_mean 0.00 0.00 0.00 0.00 1.00
sco_ability_mean 0.09 0.04 0.64 0.15 0.00
sco_opinion_mean 0.63 0.64 0.34 0.84 0.00
enjoyment_mean 0.06 0.09 0.00 0.00 0.10
identification_mean 0.35 0.17 0.17 0.11 0.20
sco_opinion_mean enjoyment_mean identification_mean
sc1_mean 1.00 0.95 1.00
sc2_mean 1.00 1.00 1.00
int1_mean 1.00 0.00 1.00
int2_mean 1.00 0.00 1.00
sco_ability_mean 0.00 1.00 1.00
sco_opinion_mean 0.00 0.09 0.15
enjoyment_mean 0.00 0.00 0.23
identification_mean 0.01 0.01 0.00
To see confidence intervals of the correlations, print with the short=FALSE option
# Relationships can of course also be examined with a regression analysis.
# Here, the relationship between self-concept at the first and second
# measurement point was computed with the lm() function
lm(sc2_mean ~ sc1_mean, data = mydata_scales)
Call:
lm(formula = sc2_mean ~ sc1_mean, data = mydata_scales)
Coefficients:
(Intercept) sc1_mean
0.3185 0.8802
# The summary() function provides somewhat more detailed results
summary(lm(sc2_mean ~ sc1_mean, data = mydata_scales))
Call:
lm(formula = sc2_mean ~ sc1_mean, data = mydata_scales)
Residuals:
Min 1Q Median 3Q Max
-3.3794 -0.3616 0.0906 0.3107 1.9212
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.31849 0.18028 1.767 0.0792 .
sc1_mean 0.88016 0.04113 21.400 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.6588 on 164 degrees of freedom
Multiple R-squared: 0.7363, Adjusted R-squared: 0.7347
F-statistic: 457.9 on 1 and 164 DF, p-value: < 2.2e-16
# You can also extract individual elements from the output
summary(lm(sc2_mean ~ sc1_mean, data = mydata_scales))[["coefficients"]]
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.3184855 0.18028385 1.766578 7.915918e-02
sc1_mean 0.8801597 0.04112954 21.399697 2.447987e-49
# The intercept is always the predicted value of the criterion (here sc2_mean)
# when all predictors (here sc1_mean) are 0. Each one-point increase
# in the predictor variable raises the predicted value of the criterion by the
# corresponding b weight (Estimate)
# The model therefore says:
# For a person with a pre-test self-concept of 0, the model would predict a post-test
# value of (Intercept)
round(summary(lm(sc2_mean ~ sc1_mean, data = mydata_scales))[["coefficients"]][1,"Estimate"], 2)
[1] 0.32
# For a person with a pre-test self-concept of 2, the model would predict
# a post-test value of (Intercept + 2 * regression weight of sc1_mean)
round(summary(lm(sc2_mean ~ sc1_mean, data = mydata_scales))[["coefficients"]][1,"Estimate"] + 2 * summary(lm(sc2_mean ~ sc1_mean, data = mydata_scales))[["coefficients"]][2,"Estimate"], 2)
[1] 2.08
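Instead of assembling the prediction by hand from the coefficient table, base R's predict() returns the same values; a minimal sketch:

```r
# Fit the model once and reuse it
model <- lm(sc2_mean ~ sc1_mean, data = mydata_scales)

# Predicted post-test values for pre-test self-concepts of 0 and 2
predict(model, newdata = data.frame(sc1_mean = c(0, 2)))
# approximately 0.32 and 2.08, matching the rounded values above
```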
# The correlation can also be derived from the regression model.
# At least in the simplest case with one criterion and one predictor,
# the correlation r is the square root of R²
sqrt(summary(lm(sc2_mean ~ sc1_mean, data = mydata_scales))[["r.squared"]])
[1] 0.8580862
# Compare with the correlation from corr.test():
corr.test(mydata_scales[,c("sc1_mean", "sc2_mean")])
Call:corr.test(x = mydata_scales[, c("sc1_mean", "sc2_mean")])
Correlation matrix
sc1_mean sc2_mean
sc1_mean 1.00 0.86
sc2_mean 0.86 1.00
Sample Size
[1] 166
Probability values (Entries above the diagonal are adjusted for multiple tests.)
sc1_mean sc2_mean
sc1_mean 0 0
sc2_mean 0 0
To see confidence intervals of the correlations, print with the short=FALSE option
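Following the hint in the last output line, the confidence intervals of the correlations can be displayed by printing with short = FALSE:

```r
# Print the full corr.test() result, including confidence intervals
print(corr.test(mydata_scales[, c("sc1_mean", "sc2_mean")]), short = FALSE)
```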
# We could now check, for example, whether the values for self-concept and interest
# collected before the intervention differ between the experimental conditions
# Differences between the experimental conditions in self-concept
# (pre-test)
t.test(sc1_mean ~ sozpos, data = mydata_scales)
Welch Two Sample t-test
data: sc1_mean by sozpos
t = -0.8296, df = 163.7, p-value = 0.408
alternative hypothesis: true difference in means between group low social position and group high social position is not equal to 0
95 percent confidence interval:
-0.5434798 0.2219060
sample estimates:
mean in group low social position mean in group high social position
4.121951 4.282738
# Differences between the experimental conditions in interest
# (pre-test)
t.test(int1_mean ~ sozpos, data = mydata_scales)
Welch Two Sample t-test
data: int1_mean by sozpos
t = 0.73648, df = 162.56, p-value = 0.4625
alternative hypothesis: true difference in means between group low social position and group high social position is not equal to 0
95 percent confidence interval:
-0.2803268 0.6138053
sample estimates:
mean in group low social position mean in group high social position
4.378049 4.211310
# The same comparison can of course also be computed with a regression analysis
summary(lm(sc1_mean ~ sozpos, data = mydata_scales))
Call:
lm(formula = sc1_mean ~ sozpos, data = mydata_scales)
Residuals:
Min 1Q Median 3Q Max
-2.78274 -1.03274 -0.03274 0.96726 2.87805
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.1220 0.1378 29.90 <2e-16 ***
sozposhigh social position 0.1608 0.1938 0.83 0.408
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.248 on 164 degrees of freedom
Multiple R-squared: 0.004181, Adjusted R-squared: -0.001891
F-statistic: 0.6885 on 1 and 164 DF, p-value: 0.4079
# The current (default) dummy coding of the sozpos factor:
contrasts(mydata_scales[,"sozpos"])
                     high social position
low social position                     0
high social position                    1
# Helmert contrasts would be another option
# With this coding the intercept is the mean of the two condition means
contrasts(mydata_scales[,"sozpos"]) <- c(-1,1)
summary(lm(sc1_mean ~ sozpos, data = mydata_scales))
Call:
lm(formula = sc1_mean ~ sozpos, data = mydata_scales)
Residuals:
Min 1Q Median 3Q Max
-2.78274 -1.03274 -0.03274 0.96726 2.87805
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.20234 0.09688 43.38 <2e-16 ***
sozpos1 0.08039 0.09688 0.83 0.408
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.248 on 164 degrees of freedom
Multiple R-squared: 0.004181, Adjusted R-squared: -0.001891
F-statistic: 0.6885 on 1 and 164 DF, p-value: 0.4079
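A quick sanity check of that claim: averaging the two condition means of sc1_mean reported by describeBy() above reproduces the new intercept:

```r
# Unweighted mean of the two group means of sc1_mean
# (4.121951 = low social position, 4.282738 = high social position)
(4.121951 + 4.282738) / 2
# ≈ 4.2023, matching the intercept of 4.20234
```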
# R's built-in PlantGrowth data set
PlantGrowth
weight group
1 4.17 ctrl
2 5.58 ctrl
3 5.18 ctrl
4 6.11 ctrl
5 4.50 ctrl
6 4.61 ctrl
7 5.17 ctrl
8 4.53 ctrl
9 5.33 ctrl
10 5.14 ctrl
11 4.81 trt1
12 4.17 trt1
13 4.41 trt1
14 3.59 trt1
15 5.87 trt1
16 3.83 trt1
17 6.03 trt1
18 4.89 trt1
19 4.32 trt1
20 4.69 trt1
21 6.31 trt2
22 5.12 trt2
23 5.54 trt2
24 5.50 trt2
25 5.37 trt2
26 5.29 trt2
27 4.92 trt2
28 6.15 trt2
29 5.80 trt2
30 5.26 trt2
## Exercise 2:
# Determine the average weight of the whole group and
# of the three experimental groups.
# Solution with describeBy():
describeBy(PlantGrowth ~ group)
Descriptive statistics by group
group: ctrl
vars n mean sd median trimmed mad min max range skew kurtosis se
weight 1 10 5.03 0.58 5.15 5 0.72 4.17 6.11 1.94 0.23 -1.12 0.18
group 2 10 1.00 0.00 1.00 1 0.00 1.00 1.00 0.00 NaN NaN 0.00
------------------------------------------------------------
group: trt1
vars n mean sd median trimmed mad min max range skew kurtosis se
weight 1 10 4.66 0.79 4.55 4.62 0.53 3.59 6.03 2.44 0.47 -1.1 0.25
group 2 10 2.00 0.00 2.00 2.00 0.00 2.00 2.00 0.00 NaN NaN 0.00
------------------------------------------------------------
group: trt2
vars n mean sd median trimmed mad min max range skew kurtosis se
weight 1 10 5.53 0.44 5.44 5.5 0.36 4.92 6.31 1.39 0.48 -1.16 0.14
group 2 10 3.00 0.00 3.00 3.0 0.00 3.00 3.00 0.00 NaN NaN 0.00
# The same solution, restricted to the weight column:
describeBy(weight ~ group, data = PlantGrowth)
Descriptive statistics by group
group: ctrl
vars n mean sd median trimmed mad min max range skew kurtosis se
weight 1 10 5.03 0.58 5.15 5 0.72 4.17 6.11 1.94 0.23 -1.12 0.18
------------------------------------------------------------
group: trt1
vars n mean sd median trimmed mad min max range skew kurtosis se
weight 1 10 4.66 0.79 4.55 4.62 0.53 3.59 6.03 2.44 0.47 -1.1 0.25
------------------------------------------------------------
group: trt2
vars n mean sd median trimmed mad min max range skew kurtosis se
weight 1 10 5.53 0.44 5.44 5.5 0.36 4.92 6.31 1.39 0.48 -1.16 0.14
## Exercise 3:
# Compute a regression analysis with weight as the criterion
# and group as the grouping variable
# What can you say about the contrasts based on the regression weights
# and the results from Exercise 2?
# Solution:
summary(lm(weight ~ group, data = PlantGrowth))
Call:
lm(formula = weight ~ group, data = PlantGrowth)
Residuals:
Min 1Q Median 3Q Max
-1.0710 -0.4180 -0.0060 0.2627 1.3690
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.0320 0.1971 25.527 <2e-16 ***
grouptrt1 -0.3710 0.2788 -1.331 0.1944
grouptrt2 0.4940 0.2788 1.772 0.0877 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.6234 on 27 degrees of freedom
Multiple R-squared: 0.2641, Adjusted R-squared: 0.2096
F-statistic: 4.846 on 2 and 27 DF, p-value: 0.01591
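As a check on the contrast interpretation: with the default dummy coding the intercept is the ctrl group mean and each group weight is the difference from ctrl, so the coefficients reproduce the group means from Exercise 2:

```r
# Recover the three group means from the dummy-coded coefficients
# (PlantGrowth is a built-in R data set)
coefs <- coef(lm(weight ~ group, data = PlantGrowth))

unname(coefs["(Intercept)"])                        # ctrl mean: 5.032
unname(coefs["(Intercept)"] + coefs["grouptrt1"])   # trt1 mean: 4.661
unname(coefs["(Intercept)"] + coefs["grouptrt2"])   # trt2 mean: 5.526
```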